homework 4, version 1

Submission by: Ayush Goel (ayush@mit.edu)

Homework 4: Epidemic modeling I

18.S191, fall 2020

This notebook contains built-in, live answer checks! In some exercises you will see a coloured box, which runs a test case on your code, and provides feedback based on the result. Simply edit the code, run it, and the check runs again.

For MIT students: there will also be some additional (secret) test cases that will be run as part of the grading process, and we will look at your notebook and write comments.

Feel free to ask questions!

student

Let's create a package environment:

Exercise 1: Modelling recovery

In this exercise we will investigate a simple stochastic (probabilistic) model of recovery from an infection and the time τ needed to recover. Although this model can be easily studied analytically using probability theory, we will instead use computational methods. (If you know about this distribution already, try to ignore what you know about it!)

In this model, an individual who is infected has a constant probability p to recover each day. If they recover on day n then τ takes the value n. Each time we run a new experiment τ will take on different values, so τ is a (discrete) random variable. We thus need to study statistical properties of τ, such as its mean and its probability distribution.

Exercise 1.1 - Probability distributions

👉 Define the function bernoulli(p), which returns true with probability p and false with probability (1 - p).
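One possible way to write this, as a minimal sketch rather than the reference solution: rand() draws a number uniformly from [0, 1), so comparing it with p gives true with probability p.

```julia
# rand() is uniform on [0, 1), so this comparison is true with probability p
# and false with probability 1 - p.
bernoulli(p::Real) = rand() < p
```

Averaging many samples, for example sum(bernoulli(0.25) for _ in 1:100_000) / 100_000, should give a value close to p.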

bernoulli (generic function with 1 method)

Got it!

Yay ❤

👉 Write a function recovery_time(p) that returns the time taken until the person recovers.

recovery_time (generic function with 1 method)

Got it!

Good job!

Hint

Remember to always re-use work you have done previously: in this case you should re-use the function bernoulli.

We should always be aware of special cases (sometimes called "boundary conditions"). Make sure not to run the code with p=0! What would happen in that case? Your code should check for this and throw an ArgumentError as follows:

throw(ArgumentError("..."))  

with a suitable error message.
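A minimal sketch of recovery_time with this check, assuming a bernoulli(p) defined as rand() < p (one possible choice, repeated here so the example runs on its own). The error message text is only a suggestion.

```julia
# Assumed definition of bernoulli: true with probability p.
bernoulli(p::Real) = rand() < p

function recovery_time(p::Real)
    # With p == 0 the loop below would never terminate, so reject it up front.
    p > 0 || throw(ArgumentError("p must be positive: with p = 0 the person never recovers"))
    days = 1
    # Each failed daily trial means one more day before recovery.
    while !bernoulli(p)
        days += 1
    end
    return days
end
```

Counting starts at 1 so that a recovery on the very first day gives τ = 1, which matches the statement that τ takes the value n when recovery happens on day n.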

👉 What happens for p=1?

interpretation_of_p_equals_one

With p = 1 the person always recovers on the first day, so recovery_time always returns 1.


Exercise 1.2

👉 Write a function do_experiment(p, N) that runs the function recovery_time N times and collects the results into a vector.
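A minimal sketch using an array comprehension. The definitions of bernoulli and recovery_time are assumed versions, repeated so that the block is self-contained.

```julia
# Assumed helper definitions (one possible implementation of each).
bernoulli(p::Real) = rand() < p

function recovery_time(p::Real)
    p > 0 || throw(ArgumentError("p must be positive"))
    days = 1
    while !bernoulli(p)
        days += 1
    end
    return days
end

# Run recovery_time N times and collect the results into a Vector.
do_experiment(p, N) = [recovery_time(p) for _ in 1:N]
```

For instance, do_experiment(0.5, 20) returns a vector of 20 positive integers, different on every run.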

do_experiment (generic function with 1 method)
small_experiment

Exercise 1.3

👉 Write a function frequencies(data) that calculates and returns the frequencies (i.e. probability distribution) of input data.

The input will be an array of integers, with duplicates, and the result will be a dictionary that maps each value that occurs to its frequency in the data.

For example,

frequencies([7, 8, 9, 7])

should give

Dict(
    7 => 0.5, 
    8 => 0.25, 
    9 => 0.25
)

As with any probability distribution, it should be normalised to 1, in the sense that the total probability should be 1.
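One way this could be done, sketched under the assumption that counting first and normalising afterwards is acceptable: get supplies a default of 0 for values that have not been seen yet.

```julia
function frequencies(data)
    # First pass: count how often each value occurs.
    counts = Dict{eltype(data), Int}()
    for value in data
        counts[value] = get(counts, value, 0) + 1
    end
    # Second pass: divide by the number of data points so the values sum to 1.
    total = length(data)
    return Dict(value => c / total for (value, c) in counts)
end
```

Dividing every count by length(data) is what makes the result a normalised probability distribution.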

frequencies (generic function with 1 method)

Hint

Do you remember how we worked with dictionaries in Homework 3? You can create an empty dictionary using Dict(). You may want to use either the function haskey or the function get on your dictionary – check the documentation for how to use these functions.

Let's run an experiment with p=0.25 and N=10,000.

large_experiment

The frequencies dictionary is difficult to interpret on its own, so instead, we will plot it, i.e. plot P(τ=n) against n, where n is the recovery time.

Plots.jl comes with a function bar, which does exactly what we want:

Great! Feel free to experiment with this function: try giving it a different array as its argument. Plots.jl is pretty clever; it even works with an array of strings!

Exercise 1.4

Next, we want to add a new element to our plot: a vertical line. To demonstrate how this works, here we added a vertical line at the maximum value.

To write this function, we first create a base plot, we then modify that plot to add the vertical line, and finally, we return the plot. More on this in the next info box.

frequencies_plot_with_maximum (generic function with 1 method)

Note about plotting

Plots.jl has an interesting property: a plot is an object, not an action. Functions like plot, bar, histogram don't draw anything on your screen - they just return a Plots.Plot. This is a struct that contains the description of a plot (what data should be plotted in what way?), not the picture.

So a Pluto cell with a single line, plot(1:10), will show a plot, because the result of the function plot is a Plot object, and Pluto just shows the result of a cell.

Modifying plots

Nice plots are often formed by overlaying multiple plots. In Plots.jl, this is done using the modifying functions: plot!, bar!, vline!, etc. These take an extra (first) argument: a previous plot to modify.

For example, to plot the sin, cos and tan functions in the same view, we do:

function sin_cos_plot()
    T = -1.0:0.01:1.0
    
    result = plot(T, sin.(T))
    plot!(result, T, cos.(T))
    plot!(result, T, tan.(T))

    return result
end

💡 This example demonstrates a useful pattern to combine plots:

  1. Create a new plot and store it in a variable

  2. Modify that plot to add more elements

  3. Return the plot

It is recommended that these 3 steps happen within a single cell. This can prevent some strange glitches when re-running cells. There are three ways to group expressions together into a single cell: begin, let and function. More on this later!

👉 Write the function frequencies_plot_with_mean that calculates the mean recovery time and displays it using a vertical line.

frequencies_plot_with_mean (generic function with 1 method)

👉 Write an interactive visualization that draws the histogram and mean for p between 0.01 (not 0!) and 1, and N between 1 and 100,000, say. To avoid a naming conflict, call them p_interactive and N_interactive, instead of just p and N.

0.01
1

As you separately vary p and N, what do you observe about the mean in each case? Does that make sense?

Exercise 1.5

👉 What shape does the distribution seem to have? Can you verify that by adding a second plot with the expected shape?
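The bars fall off by a roughly constant factor from one day to the next, which is the signature of a geometric distribution: recovering exactly on day n means failing n - 1 daily trials and then succeeding once, so P(τ = n) = p (1 - p)^(n - 1). A small sketch of that expected shape follows; geometric_pmf is a hypothetical helper name, not something defined in the notebook.

```julia
# Probability of recovering exactly on day n:
# n - 1 failed trials (each with probability 1 - p), then one success.
geometric_pmf(p, n) = p * (1 - p)^(n - 1)

# Expected shape for p = 0.25 on the first 20 days.
expected_shape = geometric_pmf.(0.25, 1:20)
```

These values can then be overlaid on the bar chart of the measured frequencies using plot! to compare the two shapes.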

Exercise 1.6

👉 Use N=10,000 to calculate the mean time τ(p) to recover as a function of p between 0.001 and 1 (say). Plot this relationship.
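A sketch of the numerical part of this exercise, leaving the plotting aside. The helper name mean_recovery_time and the choice of 50 grid points are assumptions made for illustration; bernoulli and recovery_time are assumed versions repeated so the block runs on its own.

```julia
using Statistics  # for mean

bernoulli(p::Real) = rand() < p

function recovery_time(p::Real)
    p > 0 || throw(ArgumentError("p must be positive"))
    days = 1
    while !bernoulli(p)
        days += 1
    end
    return days
end

# Hypothetical helper: sample mean of N simulated recovery times.
mean_recovery_time(p; N = 10_000) = mean(recovery_time(p) for _ in 1:N)

# Grid of p values between 0.001 and 1, and the measured mean for each.
ps = range(0.001, 1; length = 50)
means = [mean_recovery_time(p) for p in ps]
```

For a geometric distribution the mean is 1/p, so plotting means against ps should trace out a curve close to 1/p, noisier at small p where recovery times are long.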

mean_plotting (generic function with 1 method)

Exercise 2: Agent-based model for an epidemic outbreak – types

In this and the following exercises we will develop a simple stochastic model for combined infection and recovery in a population, which may exhibit an epidemic outbreak (i.e. a large spike in the number of infectious people). The population is well mixed, i.e. everyone is in contact with everyone else. [An example of this would be a small school or university in which people are constantly moving around and interacting with each other.]

The model is an individual-based or agent-based model: we explicitly keep track of each individual, or agent, in the population and their infection status. For the moment we will not keep track of their position in space; we will just assume that there is some mechanism, not included in the model, by which they interact with other individuals.

Exercise 2.1

Each agent will have its own internal state, modelling its infection status, namely "susceptible", "infectious" or "recovered". We would like to code these as values S, I and R, respectively. One way to do this is using an enumerated type or enum. Variables of this type can take only a pre-defined set of values; the Julia syntax is as follows:
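The syntax in question is presumably the @enum macro. A minimal definition consistent with the enum output shown further down in this notebook (S = 0, I = 1, R = 2) would be:

```julia
# Defines the type InfectionStatus together with its three values.
# Values are numbered from 0 in the order listed.
@enum InfectionStatus S I R
```

The names S, I and R become constants of type InfectionStatus, and instances(InfectionStatus) lists all of them.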

We have just defined a new type InfectionStatus, as well as names S, I and R that are the (only) possible values that a variable of this type can take.

👉 Define a variable test_status whose value is S.

test_status
S::InfectionStatus = 0

👉 Use the typeof function to find the type of test_status.

Enum InfectionStatus:
S = 0
I = 1
R = 2

👉 Convert test_status to an integer using the Integer function. What value does it have? What values do I and R have?

Exercise 2.2

For each agent we want to keep track of its infection status and the number of other agents that it infects during the simulation. A good solution for this is to define a new type Agent to hold all of the information for one agent, as follows:
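A definition matching this description could look as follows. The field names status and num_infected come from the text of this notebook; the mutable keyword (needed so that the fields can be changed later) and the Int field type are assumptions.

```julia
@enum InfectionStatus S I R

# mutable, because an agent's status and infection count change during a simulation.
mutable struct Agent
    status::InfectionStatus
    num_infected::Int
end
```

An agent is then created by calling the type like a function, for example Agent(S, 0).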

Agent

When you define a new type like this, Julia automatically defines one or more constructors, which are methods of a generic function with the same name as the type. These are used to create objects of that type.

👉 Use the methods function to check how many constructors are pre-defined for the Agent type.

# 3 methods for type constructor:

👉 Create an agent test_agent with status S and num_infected equal to 0.

test_agent

👉 For convenience, define a new constructor (i.e. a new method for the function) that takes no arguments and creates an Agent with status S and number infected 0, by calling one of the default constructors that Julia creates. This new method lives outside (not inside) the definition of the struct. (It is called an outer constructor.)

(In Pluto, multiple methods for the same function need to be combined in a single cell using a begin end block.)
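A sketch of such an outer constructor, grouped with the struct definition in one begin block as Pluto requires. The exact struct definition (mutable, Int field) is an assumption.

```julia
@enum InfectionStatus S I R

begin
    mutable struct Agent
        status::InfectionStatus
        num_infected::Int
    end

    # Outer constructor: defined outside the struct body, and simply
    # forwards to the default two-argument constructor.
    Agent() = Agent(S, 0)
end
```

Calling Agent() now returns a susceptible agent that has infected nobody.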

Let's check that the new method works correctly. How many methods does the constructor have now?

Exercise 2.3

👉 Write functions set_status!(a) and set_num_infected!(a) which modify the respective fields of an Agent. Check that they work. [Note the bang ("!") at the end of the function names to signify that these functions modify their argument.]
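A minimal sketch of the two setters. The task writes them with a single argument a, but a new value has to come from somewhere, so the second argument shown here is an assumption; the Agent definition is likewise an assumed one.

```julia
@enum InfectionStatus S I R

mutable struct Agent
    status::InfectionStatus
    num_infected::Int
end

# Both functions mutate their first argument, hence the trailing "!".
function set_status!(agent::Agent, new_status::InfectionStatus)
    agent.status = new_status
    return agent
end

function set_num_infected!(agent::Agent, n::Integer)
    agent.num_infected = n
    return agent
end
```

Returning the agent is a convention that makes the calls easy to chain or inspect in a notebook cell.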

set_status! (generic function with 1 method)
set_num_infected! (generic function with 1 method)

Got it!

You got the right answer!

👉 We will also need functions is_susceptible and is_infected that check if a given agent is in those respective states.
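These predicates can be one-liners that compare the status field with the corresponding enum value. This is a sketch that assumes the Agent definition shown in the block.

```julia
@enum InfectionStatus S I R

mutable struct Agent
    status::InfectionStatus
    num_infected::Int
end

# Each predicate returns a Bool.
is_susceptible(agent::Agent) = agent.status == S
is_infected(agent::Agent) = agent.status == I
```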

is_susceptible (generic function with 1 method)
is_infected (generic function with 1 method)

Got it!

Yay ❤

Exercise 2.4

👉 Write a function generate_agents(N) that returns a vector of N freshly created Agents. They should all be initially susceptible, except one, chosen at random (i.e. uniformly), who is infectious.
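One possible sketch: build N susceptible agents, then use rand on the vector to pick one of them uniformly and mark it as infectious. The Agent definition is an assumed one.

```julia
@enum InfectionStatus S I R

mutable struct Agent
    status::InfectionStatus
    num_infected::Int
end

function generate_agents(N::Integer)
    # A comprehension creates N distinct Agent objects.
    agents = [Agent(S, 0) for _ in 1:N]
    # rand(agents) returns one element chosen uniformly at random;
    # since Agent is mutable, changing it changes the element in the vector.
    rand(agents).status = I
    return agents
end
```

Using a comprehension rather than fill(Agent(S, 0), N) matters here: fill would put the same object in every slot, so infecting one agent would infect them all.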

generate_agents (generic function with 1 method)

Got it!

You got the right answer!

We will also need types representing different infections.

Let's define an (immutable) struct called InfectionRecovery with parameters p_infection and p_recovery. We will make it a subtype of an abstract AbstractInfection type, because we will define more infection types later.
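A sketch of these two definitions; the Float64 field types are an assumption.

```julia
# Abstract supertype shared by all infection models.
abstract type AbstractInfection end

# Immutable: the parameters of an infection do not change during a simulation.
struct InfectionRecovery <: AbstractInfection
    p_infection::Float64
    p_recovery::Float64
end
```

An instance such as InfectionRecovery(0.02, 0.002) can then be passed to any function written for AbstractInfection.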

Exercise 2.5

👉 Write a function interact! that takes an affected agent of type Agent, a source of type Agent and an infection of type InfectionRecovery. It implements a single (one-sided) interaction between two agents:

  • If the agent is susceptible and the source is infectious, then the source infects our agent with the given infection probability. If the source successfully infects the agent, then the source's num_infected record must be updated.

  • If the agent is infected then it recovers with the relevant probability.

  • Otherwise, nothing happens.
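The three cases above can be sketched as follows. All type definitions and bernoulli are assumed versions repeated so that the block is self-contained; returning the agent is a design choice, not something the task requires.

```julia
@enum InfectionStatus S I R

mutable struct Agent
    status::InfectionStatus
    num_infected::Int
end

abstract type AbstractInfection end

struct InfectionRecovery <: AbstractInfection
    p_infection::Float64
    p_recovery::Float64
end

bernoulli(p::Real) = rand() < p

function interact!(agent::Agent, source::Agent, infection::InfectionRecovery)
    if agent.status == S && source.status == I
        # Case 1: possible transmission from source to agent.
        if bernoulli(infection.p_infection)
            agent.status = I
            source.num_infected += 1
        end
    elseif agent.status == I
        # Case 2: an infectious agent may recover.
        if bernoulli(infection.p_recovery)
            agent.status = R
        end
    end
    # Case 3: otherwise nothing happens.
    return agent
end
```

Setting a probability to 0 or 1 makes the outcome deterministic, which is convenient when checking each branch by hand.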

interact! (generic function with 2 methods)

Play around with the test case below to test your function! Try changing the definitions of agent, source and infection. Since we are working with randomness, you might want to run the cell multiple times.

Got it!

Your function treats the susceptible agent case correctly!

Got it!

Your function treats the infectious agent case correctly!

Got it!

Your function treats the recovered agent case correctly!

Exercise 3: Agent-based model for an epidemic outbreak – Monte Carlo simulation

In this exercise we will build on Exercise 2 to write a Monte Carlo simulation of how an infection propagates in a population.

Make sure to re-use the functions that we have already written, and introduce new ones if they are helpful! Short functions make it easier to understand what the function does and build up new functionality piece by piece.

You should not use any global variables inside the functions: each function must accept as arguments all the information it requires to carry out its task. You need to think carefully about what information each function requires.

Exercise 3.1

👉 Write a function step! that takes a vector of Agents and an infection of type InfectionRecovery. It implements a single step of the infection dynamics as follows:

  • Choose two random agents: an agent and a source.

  • Apply interact!(agent, source, infection).

  • Return agents.

step! (generic function with 1 method)

👉 Write a function sweep!. It runs step! N times, where N is the number of agents. Thus each agent acts, on average, once per sweep; a sweep is thus the unit of time in our Monte Carlo simulation.

sweep! (generic function with 1 method)

👉 Write a function simulation that does the following:

  1. Generate the N agents.

  2. Run sweep! a number T of times. Calculate and store the total number of agents with each status at each step in variables S_counts, I_counts and R_counts.

  3. Return the vectors S_counts, I_counts and R_counts in a named tuple, with keys S, I and R.

You've seen an example of named tuples before: the student variable at the top of the notebook!

Feel free to store the counts in a different way, as long as the return type is the same.
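Putting step!, sweep! and simulation together, a compact sketch of the whole pipeline might look like this. Every definition is an assumed version repeated for self-containment; count_status is a hypothetical helper, and the argument order simulation(N, T, infection) is an assumption.

```julia
@enum InfectionStatus S I R
mutable struct Agent
    status::InfectionStatus
    num_infected::Int
end
abstract type AbstractInfection end
struct InfectionRecovery <: AbstractInfection
    p_infection::Float64
    p_recovery::Float64
end
bernoulli(p::Real) = rand() < p

function generate_agents(N::Integer)
    agents = [Agent(S, 0) for _ in 1:N]
    rand(agents).status = I          # one uniformly chosen initial case
    return agents
end

function interact!(agent::Agent, source::Agent, infection::InfectionRecovery)
    if agent.status == S && source.status == I
        if bernoulli(infection.p_infection)
            agent.status = I
            source.num_infected += 1
        end
    elseif agent.status == I && bernoulli(infection.p_recovery)
        agent.status = R
    end
    return agent
end

# One step: pick an agent and a source independently at random
# (they may coincide in this sketch) and let them interact.
function step!(agents::Vector{Agent}, infection::AbstractInfection)
    interact!(rand(agents), rand(agents), infection)
    return agents
end

# One sweep: as many steps as there are agents.
function sweep!(agents::Vector{Agent}, infection::AbstractInfection)
    for _ in 1:length(agents)
        step!(agents, infection)
    end
    return agents
end

# Hypothetical helper: number of agents currently in a given status.
count_status(agents, status) = count(a -> a.status == status, agents)

function simulation(N::Integer, T::Integer, infection::AbstractInfection)
    agents = generate_agents(N)
    S_counts, I_counts, R_counts = Int[], Int[], Int[]
    for _ in 1:T
        sweep!(agents, infection)
        push!(S_counts, count_status(agents, S))
        push!(I_counts, count_status(agents, I))
        push!(R_counts, count_status(agents, R))
    end
    return (S = S_counts, I = I_counts, R = R_counts)
end
```

A call such as simulation(100, 1000, InfectionRecovery(0.02, 0.002)) returns a named tuple whose fields S, I and R are vectors of length T; at every time step the three counts add up to N.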

stat_frequencies (generic function with 1 method)
simulation (generic function with 1 method)

We used a let block in this cell to group multiple expressions together, but how is it different from begin or function?

function vs. begin vs. let

Writing functions is a way to group multiple expressions (i.e. lines of code) together into a mini-program. Note the following about functions:

  • A function always returns one object.[1] This object can be given explicitly by writing return x, or implicitly: Julia functions always return the result of the last expression by default. So f(x) = x+2 is the same as f(x) = return x+2.

  • Variables defined inside a function are not accessible outside the function. We say that function bodies have a local scope. This helps to keep your program easy to read and write: if you define a local variable, then you don't need to worry about it in the rest of the notebook.

There are two other ways to group expressions together that you might have seen before: begin and let.

begin

begin will group expressions together, and it takes the value of its last subexpression.

We use it in this notebook when we want multiple expressions to always run together.

let

let also groups multiple expressions together into one, but variables defined inside of it are local: they don't affect code outside of the block. So like begin, it is just a block of code, but like function, it has a local variable scope.

We use it when we want to define some local (temporary) variables to produce a complicated result, without interfering with other cells. Pluto allows only one definition per global variable of the same name, but you can define local variables with the same names whenever you wish!
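A small illustration of the difference, run at the top level of a script or in a single cell. The variable names are arbitrary.

```julia
# begin: groups expressions, but its variables are ordinary (global) variables.
total_begin = begin
    a = 2
    b = 3
    a + b          # value of the block
end

# let: same grouping, but c and d exist only inside the block.
total_let = let
    c = 10
    d = 20
    c + d          # value of the block
end
```

After running this, a and b are still available, whereas c and d are not defined outside the let block.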

[1] Even a function like

f(x) = return

returns one object: the object nothing — try it out!

Exercise 3.2

Alright! Every time that we run the simulation, we get slightly different results, because it is based on randomness. By running the simulation a number of times, you start to get an idea of the mean behaviour of our model. This is the essence of a Monte Carlo method! You use computer-generated randomness to generate samples.

Instead of pressing the button many times, let's have the computer repeat the simulation. In the next cells, we run your simulation num_simulations=20 times with N=100, p_infection=0.02, p_recovery=0.002 and T=1000.

Every single simulation returns a named tuple with the status counts, so the result of multiple simulations will be an array of those. Have a look inside the result, simulations, and make sure that its structure is clear.

repeat_simulations (generic function with 1 method)
simulations

In the cell below, we plot the evolution of the number of I individuals as a function of time for each of the simulations on the same plot using transparency (alpha=0.5 inside the plot command).

👉 Calculate the mean number of infectious agents of our simulations for each time step. Add it to the plot using a heavier line (lw=3 for "linewidth") by modifying the cell above.

Check the answer yourself: does your curve follow the average trend?

Hint

This exercise requires some creative juggling with arrays, anonymous functions, maps, or whatever you see fit!
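One way to do this juggling, sketched with a hypothetical helper mean_I: for each time step t, average the t-th entry of the I vector over all simulations. The tiny input below is purely illustrative data, not simulation output.

```julia
using Statistics  # for mean

# `simulations` is assumed to be a vector of named tuples with a field I,
# all of the same length T.
function mean_I(simulations)
    steps = eachindex(first(simulations).I)
    return [mean(sim.I[t] for sim in simulations) for t in steps]
end

# Illustrative toy input: two "simulations" of two time steps each.
toy = [(S = [9, 8], I = [1, 2], R = [0, 0]),
       (S = [9, 7], I = [1, 3], R = [0, 0])]
toy_mean = mean_I(toy)
```

The resulting vector has one entry per time step and can be added to the existing plot with plot! and lw=3.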

👉 Write a function sir_mean_plot that returns a plot of the means of S, I and R as a function of time on a single graph.

sir_mean_plot (generic function with 1 method)

👉 Allow p_infection and p_recovery to be changed interactively and find parameter values for which you observe an epidemic outbreak.

0.001
0.001

👉 Write a function sir_mean_error_plot that does the same as sir_mean_plot, but also computes the standard deviation σ of S, I, R at each step. Add this to the plot using error bars, using the option yerr=σ in the plot command; use transparency.
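The numerical part of this task can be sketched separately from the plotting. The helper mean_and_std below is hypothetical, and the small input is illustrative data only.

```julia
using Statistics  # for mean and std

# For one status (:S, :I or :R), return the mean and the standard deviation
# across simulations at every time step.
function mean_and_std(simulations, key::Symbol)
    series = [sim[key] for sim in simulations]   # one vector per simulation
    steps = eachindex(first(series))
    μ = [mean([s[t] for s in series]) for t in steps]
    σ = [std([s[t] for s in series]) for t in steps]
    return μ, σ
end

# Illustrative toy input: two "simulations" of two time steps each.
toy = [(S = [9, 8], I = [1, 2], R = [0, 0]),
       (S = [9, 7], I = [1, 3], R = [0, 0])]
μ_I, σ_I = mean_and_std(toy, :I)
```

The two vectors can then be passed to plot as the y values and as yerr=σ respectively. Note that std needs at least two simulations to give a meaningful result.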

This should confirm that the distribution of I at each step is pretty wide!

sir_mean_error_plot (generic function with 1 method)

Exercise 3.3

👉 Plot the probability distribution of num_infected. Does it have a recognisable shape? (Feel free to increase the number of agents in order to get better statistics.)

Exercise 3.4

👉 What are three simple ways in which you could characterise the magnitude (size) of the epidemic outbreak? Find approximate values of these quantities for one of the runs of your simulation.

Exercise 4: Reinfection

In this exercise we will re-use our simulation infrastructure to study the dynamics of a different type of infection: there is no immunity, and hence no "recovery"; rather, susceptible individuals may now be re-infected.

Exercise 4.1

👉 Make a new infection type Reinfection. This has the same two fields as InfectionRecovery (p_infection and p_recovery). However, "recovery" now means "becomes susceptible again", instead of "moves to the R class".

This new type Reinfection should also be a subtype of AbstractInfection. This allows us to reuse our previous functions, which are defined for the abstract supertype.

👉 Make a new method for the interact! function that accepts the new infection type as argument, reusing as much functionality as possible from the previous version.

Write it in the same cell as our previous interact! method, and use a begin block to group the two definitions together.
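One possible sketch of the two methods in a single begin block. The Reinfection method handles only what differs (an infectious agent becomes susceptible again) and delegates the transmission case to the InfectionRecovery method. All surrounding definitions are assumed versions repeated for self-containment, and the delegation trick is one design among several.

```julia
@enum InfectionStatus S I R
mutable struct Agent
    status::InfectionStatus
    num_infected::Int
end
abstract type AbstractInfection end
struct InfectionRecovery <: AbstractInfection
    p_infection::Float64
    p_recovery::Float64
end
struct Reinfection <: AbstractInfection
    p_infection::Float64
    p_recovery::Float64
end
bernoulli(p::Real) = rand() < p

begin
    function interact!(agent::Agent, source::Agent, infection::InfectionRecovery)
        if agent.status == S && source.status == I
            if bernoulli(infection.p_infection)
                agent.status = I
                source.num_infected += 1
            end
        elseif agent.status == I && bernoulli(infection.p_recovery)
            agent.status = R
        end
        return agent
    end

    function interact!(agent::Agent, source::Agent, infection::Reinfection)
        if agent.status == I
            # "Recovery" now means becoming susceptible again.
            if bernoulli(infection.p_recovery)
                agent.status = S
            end
        else
            # Transmission works exactly as before, so reuse that method.
            interact!(agent, source,
                      InfectionRecovery(infection.p_infection, infection.p_recovery))
        end
        return agent
    end
end
```

Because step!, sweep! and simulation only pass the infection object along, they work unchanged with Reinfection as long as their argument types are declared as AbstractInfection or left untyped.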

Exercise 4.2

👉 Run the simulation 20 times and plot I as a function of time for each one, together with the mean over the 20 simulations (as you did in the previous exercises).

Note that you should be able to re-use the sweep! and simulation functions, since those should be sufficiently generic to work with the new step! function! (Modify them if they are not.)

simulations2

👉 Run the new simulation and draw I (averaged over runs) as a function of time. Is the behaviour qualitatively the same or different? Describe what you see.

Exercise 5 - Lecture transcript

(MIT students only) Please see the link for the hw 4 transcript document on Canvas. We want each of you to correct about 400 lines, but don’t spend more than 15 minutes on it. See the beginning of the document for more instructions. 👉 Please mention the name of the video(s) and the line ranges you edited:

lines_i_edited

Abstraction, lines 1-219

Array Basics, lines 1-137

Course Intro, lines 1-44

(for example)

Function library

Just some helper functions used in the notebook.

hint (generic function with 1 method)
almost (generic function with 1 method)
still_missing (generic function with 2 methods)
keep_working (generic function with 2 methods)
yays
correct (generic function with 2 methods)
not_defined (generic function with 1 method)